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ABSTRACT 

We describe a procedure for background subtracting Sloan Digital Sky Survey (SDSS) imaging 
that improves the resulting detection and photometry of large galaxies on the sky. Within each 
SDSS drift scan run, we mask out detected sources and then fit a smooth function to the variation 
of the sky background. This procedure has been applied to all SDSS-III Data Release 8 images, 
and the results are available as part of that data set. We have tested the effect of our background 
subtraction on the photometry of large galaxies by inserting fake galaxies into the raw pixels, 
reanalyzing the data, and measuring them after background subtraction. Our technique results 
in no size-dependent bias in galaxy fluxes up to half-light radii r^o ~ 100 arcsec; in contrast, for 
galaxies of that size the standard SDSS photometric catalog underestimates fluxes by about 1.5 
mag. Our results represent a substantial improvement over the standard SDSS catalog results 
and should form the basis of any analysis of nearby galaxies using the SDSS imaging data. 



1. Why improve the sky subtraction? 



The Sloan Digital Sky Survey (SDSS: lYork et al.ll2000h is t he largest existing sury e y of the sky to date ; 



the SDSS-III program has now released Data Release 8 (DR8; lEisenstein et all 1201 1[ lAihara et al.ll201lh . 
This release provides reliable, well-calibrated catalogs down to r ~ 22.5 over 14,500 square degrees of sky 
in five bands (u, g, r, i and z). In so doing, it has provided images of a large fraction of the known bright 
galaxies on the sky — the Messier and NGC objects, as well as lower surface brightness nearby galaxies. 
However, no reliable catalog of the properties of these galaxies yet exists, primarily because no standard 
background subtraction procedure has been implemented that properly treats the largest, brightest galaxies 
in the sky. The purpose of this paper is to provide just such a background procedure, which has now been 
run over the entire SDSS imaging dataset, and which is being released as part of DR8. 

Background subtraction of astronomical images is probably a formally impossible task. The ideal 
treatment of the data gathered by a detector would be to explain (at reasonable x 2 ) the counts in each 
pixel, using a physical model of the sky brightness and its gradients, the telescopic optics and scattered light 
properties, the astronomical and other sources of light, and the sensitivity, noise and other properties of the 
detector. However, this task is impractical (and likely intractable as well), given the detail needed in such 
a model, the time variability of conditions and instruments, and the probable necessity of treating all the 
data simultaneously. For this reason, most practical applications of background subtraction tend to focus 
on simple approximations that are tractable as well as close to correct for the problems of interest. 

For t he SDSS, the stand ard photometric pipeline, named photo, takes jus t such a practical and a ccurate 
approach ( Lupton et alJioOl ). In the catalog released in Data Release 7 (DR7; Abazaiian et al.|[2009h . photo 



calculates the median-smoothed background on a scale of 100 x 100 arcsec and subtracts that from the image 
before object detection and measurement. In the catalog release in DR8, a more sophisticated approach is 
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used, which fi rst models the brig htest galaxies in each field so that they do not contaminate the sky estimate 
as much (see lAihara et al.ll201ll for a more complete description). 



For faint point source photometry the above approach is highly accurate (at least, as long as confusion 
is not a problem). In almost all cases, any diffuse light is subtracted away (whatever its source) while the 
point source itself is left untouched. For these sources, it is not necessary to fully model the background — 
just to separate point sources from any diffuse sources. The resulting fluxes of stars are generally repeatable 
at well below the percent level; t hat is, as well as can be e xpected given the vagaries of determining the 
overall calibration of the data set ( Padmanabhan et al. 20081) . 



However, for galaxies the approach that photo uses is more troublesome. Whereas it is appropriate for 
galaxies of small enough angular extent, for larger galaxies it can be inappropria te. Typically, this method 



2005a: 


Hvde & Bernardi 


2009; West et al. 2010) 


brightest cluster galaxies ( 


Bernardi et al. 


2007; 



While the DR8 algorithms represent a 



modest improvement in the sky subtraction, the bulk of the problem remains. 

In $2] below we present a different method for sky subtraction that is more accurate for large galaxies 
(while retaining most of the accuracy for point sources). Our method begins with the estimate of the sky 
background from a processed SDSS imaging run. It builds masks around all bright detected sources, and any 
known sources close to but outside the edge of the imaging run. Then it fits a smooth spline to the unmasked 
data, applying appropriate constraints to regularize the problem in the presence of the heavy masking. 

In §3\ we describe how we take sky subtracted SDSS fields and mosaic them into single image, for the 
purposes of analyzing large galaxies. 

In S|4j we test the resulting photometry of these images in three ways. First, we compare aperture 
photometry of point sources to the standard SDSS aperture photometry, to quantify how much degradation 
our mosaicking procedure and sky subtraction have introduced. Second, we insert fake bright galaxies into 
the raw data and reanalyze the data completely. We then compare our background subtracted mosaicked 
images to the original fake data to evaluate the fidelity of our procedure. We also evaluate the performance 
of two versions of the standard SDSS pipelines (the ones used in SDSS DR7 and DR8). Third, we compare 
to the only other publicly available mosaicking and background subtr action facility for SDSS that we are 
aware of, the Montage tool distributed by IRS A (jBerriman et al.ll2003l ); for bright galaxies we find excellent 
agreement. 

In iJSJ we summarize. The results described here are available on a field-by-field basis in the SDSS DR8 
data setQ Tools for returning mosaics are also available!! Finally, a previous version of the results, for DR7, 
is also still available^ 



1 http : //www . sdss3 . org/dr8 

^http : //data. sdss3 . org/mosaics 

d http : //sdss .physics .nyu . edu/sdss3/mosaic/ 
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2. A model fit to the SDSS sky background 



2.1. The SDSS imaging data 

The SDSS has taken ugriz CCD imaging of 14, 555 deg 2 of the sky dYork et all l200d lAihara et al 



201 lh . Automated software performs all of the data proc essing: astrometry (jPier et al. 2003); source identi 
fication, deblend ing and photometry (ILupton et a 112001); photometricity determin ation (|Hogg et al.ll200lh 
and calibration ( Fukugita et al. 19961: Smith et al. 20021 ; Padmanabhan et al. 2007 ). 



As iGunn et al.l (|2006l ) describe, the SDSS focal plane has 30 imaging CCDs evenly spaced (with gaps) 
across it, six for each of the five filters [u, g, r, i and z). Each exposure is taken in drift scan mode, such 
that a point in the sky passes through each CCD sequentially with a gap of 71.72 seconds on average. This 
procedure produces six long rectangular images of the sky, called "camcols." In the standard SDSS pipeline, 
each camcol is further broken into multiple "fields," each with a width of 1361 pixels in the scan direction. 

The native pixel scale on the CCDs is approximately 0.396 arcsec. When we refer to pixels in this paper 
it is this native scale we will mean. When we discuss these images, we will refer to the position in the scan 
direction as the "pixel row" or y, and the position perpendicular to the scan direction as the "pixel column" 
or x. 

Figure [T] shows this geometry for part of run number 1336 in the r-band, showing each camcol and the 
gaps in between each one. It shows an area about 2.3 deg wide representing about 25 minutes of exposure 
time. This image is the standard SDSS pipeline's estimate of the background light used for photometry, 
binned in 8x8 pixels — that is, detected objects have been removed from this image. In detail, it is 
equal to the actual counts in areas where there were no detected objects, and is equal to the background 
estimate plus appropriate noise in areas where there were detected objects. This determination is made 
before any calibration except for the flat-field correction, which is only a function of pixel column because 
the observations are drift scans. The units are raw counts; because of the varying gains among the CCDs 
two of camcols mostly saturate in this particular image stretch. 

Inspection of Figure [1] reveals that residual light from bright objects is easily visible in the background 
light image. This residual light is due to the SDSS's background light estimation method, which consists of a 
median-smoothed image on a scale of 100 x 100 arcsec. Under this procedure, large objects have a substantial 
amount of their light removed during background subtraction — therefore, the background shows the presence 
of bright stars. We will proceed by fitting a much smoother function to this SDSS sky estimate in order to 
model the obvious variation with the background with time but not oversubtract the brightest objects. 



2.2. Masking the data 

The initial step in our background subtraction is to mask the data appropriately. We cannot allow 
the subtraction to be strongly affected by the presence of bright stars or galaxies, if we want to accurately 
measure the fluxes of those objects. We begin by defining an initial mask around known bright objects. 

First, we identify any objects detected in the SDSS catalog with magnitude m < 15 in the given 
filter. We create a mask of size 32x32 native SDSS pixels around the faintest such objects, growing with 
decreasing magnitude to 1600x1600 pixels at m = 12 (and constant with magnitude for brighter objects). 
This procedure masks successfully finds and masks bright stars inside the imaging run. 
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However, galaxies within the imaging run are frequently shredded into fragments by the SDSS deblender, 
and it is more complex to determine what area to mask based on the SDSS catalog alone. Therefore, we also 



identify any galaxies from the Third Reference Catalog (RC3; Ide Vaucouleurs et al.lll991l ) that land within 
the imaging run and build masks around those objects that are 1000x1000 pixels. This procedure masks a 
number of bright galaxies that are large enough to affect our background subtraction but that our first step 
above would not catch. 

Finally, the SDSS filters suffer from internal reflection at their edges, causing stars just outside the frame 
to scatter light onto the CCD, creating large-scale features that we do not want included in our background 
subtraction model. (A more detailed model would explicitly subtract these features, that we do not attempt 
here) . We mask these reflections by identifying any stars from the Tycho-2 catalog that lie just outside the 
frame and masking a rectangular region 1500x900 pixels centered on the star. We use a rectangular region 
because the reflection has a larger extent in the pixel column direction than in the row direction. 

Occasionally SDSS fields contain objects sufficiently bright that the SDSS pipeline fails to reduce them 
at all. We mask the entirety of those fields. 

Figure [2] shows this initial mask for the run shown in Figure [TJ White areas are those to be used in our 
background model fit while black areas are those to be ignored. The mask is intentionally very conservative 
— we want to minimize the flux that is incorrectly assigned to the sky background. As described in the next 
section, we also apply further masking to the run during our iterative fit. 



2.3. Spline model fit 

In order to fit the data, we first further bin it from 8x8 to 8x680 pixels — that is, we heavily bin it 
in the pixel row direction (i.e. parallel to the scan direction). We keep the resolution in the pixel column 
direction because the sky ends up tracking some residual flat-fielding errors. When rebinning, we account 
for the mask, and track the weight of each 8 x 680 binned pixel as the number of non-masked 8x8 pixels it 
contains. 

The spline model that we fit to the final binned data is as follows: 

/(camcol, x, y) = S'(camcol) x Y(camcoLy) x A" (camcol, x,y), (1) 

where x and y are the pixel column and row positions. S'(camcol) is an overall scaling factor for each camcol. 
Y(camcol, y) is a second-order b-spline for each camcol, with break-points spaced approximately once per 
SDSS field (1361 pixels). Thus, it expresses the overall variation of the sky over time during the scan. 
A (camcol, x, y) is a second-order b-spline in two dimensions, with break-points spaced once every 8 pixels in 
the x direction and spaced once every 40 fields in the y direction. Thus, it allows for a rapid variation with 
column (which accounts for flat-fielding errors) but allows that pattern to vary slowly over time. 

This function is trilinear in the parameters (linear in S and in the b-spline parameters of A and Y); 
thus an iterative fit is required. Our approach here amounts to an approximation to \ 2 minimization of the 
data with respect to the model. 

First, we determine S (camcol) by taking a median across each camcol. 

Second, we divide each camcol by S and perform \ 2 minimization to fit Y alone. We quadratically 
couple the spline parameters strongly in the row direction to keep the fit smooth. In addition, we allow 



- 5 - 



a small quadratic coupling between the camcols in order to interpolate more smoothly over masked data. 
These quadratic couplings retain the linearity of the x 2 fit- 
Third, we divide the data by S and Y, and finally perform a second \ 2 minimization, this time for X. 
Again, we allow coupling between the spline parameters. In this case, we allow weak coupling between the 
parameters in the row direction but very strong coupling in the column direction, to keep X as smooth as it 
can be. In the x 2 minimizations for Y and X , we use the mask weights, but no inverse variance weighting. 

We have a special condition for CCDs that have two amplifiers rather than one. For these CCDs, half 
the pixels are read out by one device and half by another; therefore, we do not couple the pixels across the 
divide, and indeed we observe that the gains of the amplifiers do drift at the fraction of the percent-level 
over the course of each run. 

Finally, given the resulting model /(camcol, x, y), we estimate an rms dispersion a around the fit, and 
then mask all regions that are greater than 2a from the fit (in addition to the initial mask described in £|2.2|) . 
We then iterate the fit four more times. 

An example final fit is shown in Figure[3l This model is non-parametric and not physical — consequently, 
for diffuse sources many arcminutes in size, our method will by design oversubtract the light. However, M4.3I 
below shows that it is sufficiently good to allow accurate measurements of most nearby galaxies. 

Figure 0] shows the residuals of the data after subtracting the model sky. There are remaining dis- 
crepancies visible, which in the next subsection we will show to be at a level that is fainter than 26 mag 
arcsec -2 . 



2.4. Residual tests 

We have performed a simple residual test by considering random unmasked patches that are 13x13 
native SDSS pixels in size. We place these patches in locations that the SDSS photometric pipeline has 
identified as "sky" objects — areas with no detected flux in the pixels. 

Figure [5] shows the distribution of the mean fluxes per square arcsec in such patches across all SDSS 
runs. The performance in most areas is excellent, with a standard deviation representing residuals fainter 
than 26 mag arcsec~ 2 in the r-band. There are a handful of outliers, which we find nearly always to be 
associated with very extended emission from the interstellar medium or an extremely bright star. While in 
regions of very extended emission, we cannot guarantee that our sky estimate is correct, we do expect that 
they will appear as bright outliers in this residual plot. 

A smaller contribution to the dispersion in this plot is variable emission in the night sky lines; this 
variability mainly affects the r, i, and z bands. On some nights, these variations can cause ~ 10% fluctuations 
in the broad-band sky level on time scales of 5 to 10 minutes. Usually our sky subtraction methods remove 
this variation well, but in regions which also have very bright objects, occasionally there are residuals at the 
few percent level. In large-scale images, these may show up as red, low surface brightness streaks. 

These residual tests are adequate insofar as they test the performance of our fitting procedure and help 
to identify runs with highly variable backgrounds (astronomical or instrumental). However, they do not 
provide a direct estimate of the effects of sky subtraction errors on the resulting galaxy photometry — that 
is, they do not test whether we might be oversubtracting light near large sources. We test that effect more 
thoroughly in 34.31 
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2.5. Distribution of DR8 sky subtracted fields and mosaics 

The SDSS DR8 data release distributes all of the flat-fielded, sky subtracted, calibrated fields in each 
band based on this procedure. This data consists of approximately 940,000 fields, each 2048 xf 489 pixels in 
size, available for download on an individual field basis (or requesting cutouts) from the SDSS-III Science 
Archive Server Q 

The data format is FITS. Each file's primary HDU contains a floating-point image, calibrated in units 
of nanomaggies per pixel. The images have had defects such as bad columns and cosmic rays identified and 
interpolated over by the photo pipeline. 

The images are compressed by zeroing out all but the 10 most significant binary digits in the floating 
point numbers, and subsequently using bzip2 to losslessly compress the result. For the sky subtracted 
images, the 10th binary digit is considerably smaller than the noise in any image, meaning that although 
overall this compres sion is slightly lossy, it has no significant effect on the results — even if these images are 
combined in a stack |Price- Whelanfc Ho"^l2010h . Subsequent HDUs in each image yield the sky subtraction 
and calibration image, so that the sky can be added back in and the image put back into counts if desired. 
Full details of the format and location of the files can be found in the SDSS-III data model! 

The Science Archive Server also provides mosaicking tools. They work slightly differently than the 
procedure described in Ej3j Instead of using the raw data, they begin with the corrected frames described 
here and mosaic the fields together using the swarp package! However, they yield equivalent results. 



3. Mosaicking the SDSS 

3.1. Generating mosaics 

The primary aim of our sky subtraction is to improve measurements of large objects on the sky. Many 
such objects overlap the edges of SDSS fields (and some are bigger than or comparable to the size of a single 
field). Thus, to recover their photometry we must mosaic together multiple fields after the background 
subtraction. Here we describe our procedure. For historical reasons, for these mosaics we have not begun 
with the calibrated frames described in ^12.51 but with the raw data. 



For each mosaic we specify a desired World Coordinate System header (WCS; iGreisen k, Calabretta 



20021 ). We identify all SDSS fields that overlap the desired area and pick the minimal set that fully cover 



that area. 



Then we prepare each field for mosaicking . Starting with the raw SP SS data, we apply the flat field 



determined by the ubercalibration procedure (jPadmanabhan et al. 2007). Next w e identify cosmic rays 
using a procedure similar to that used by the SDSS pipeline (jLupton et al. 2001 ). We interpolate over 



saturated pixels and cosmic rays using simple linear interpolation in the x direction. We then subtract our 
estimate the sky, determined by evaluating the function described by Equation [T] at each image pixel. In 
the sky estimate, we account for any difference in the flat field as originally used by the SDSS photometric 
pipeline, and as determined by ubercalibration. Finally, we apply the photometric calibration to each field 



4 http : //data . sdss3 . org/ 

5 http : //data. sdss3 . org/datamodel/f iles/B0SS_PH0T00B J/f rames/RERUN/RUN/CAMCOL/f rame . html 
e http : //www. astromatic .net 
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( Padmanabhan et al.l l2007). 



To resample each image we evaluate the position of each desired pixel within each original field. Then we 
interpolate the origi nal field's image to the desire d set of locations, using the well-known cubic approximation 
to the sine function ( Park fc Schowenger dt 1983). This resampling method is known to work well for Nyquist- 



sampled images, a condition SDSS virtually always satisfies. In practice, we use the built-in IDL utilities 
polywarp and poly_2d to perform this resampling. 

We evaluate the weighted average of the flux from all the input images at each output pixel. The weights 
are unity throughout most of each input image, but are apodized smoothly to zero at the edges. Thus, the 
transition between regions which overlap two fields are relatively smooth. Any output pixels that have no 
overlapping input images are set to zero. 



3.2. Example mosaics 

Figure [5] shows several example mosaics created with this procedure. We have obviously chosen these 
partly for dramatic effect, but also to demonstrate the robustness of the masking to the presence of large, 
bright objects of various sorts: globular clusters, galaxy clusters, and large galaxies. Each of these mosaics 
is built from two up to six fields depending on its size (which ranges from 12 to 24 arcmin on a side). 



3.3. Distribution of DR7 mosaics 

Although our procedures allow us to evaluate any arbitrary mosaic, for ease of distribution we have 
created a set of lxl deg mosaics across the entire observed Northern Galactic Cap, or almost 8000 deg 2 . We 
have created a web interface that allows users to easily extract sections of these DR7 mosaics (see footnote 
in JQ for URL). 

This interface can be used either interactively or non-interactively. The interactive method allows the 
user to browse a Google sky viewer and select the region of interest. The non- interactive method is explained 
in its documentation. In either case, the tool returns FITS format images in each desired bandpass, with 
correct WCS headers, sky subtracted and calibrated. The units of the images are nanomaggies per pixel. 

Note that we have not yet produced any such mosaics of the area imaged in SDSS-IIFs new imaging 
in the South Galactic Cap. However, as described in £ 12.51 the standard DR8 data release tools do provide 
mosaicking tools. 



3.4. Comparison to Montage 



The only other publicly available mosaicking and background su btraction facility for SPSS th at we are 

We will 



200 



1 l2004h . 



aware of is the results of the Montage package distributed by IRSA (jBerriman et al. 
describe the quantitative differences between their results and ours in ^4.61 Here we discuss methodological 
differences. 



The first major difference is in their method for background subtraction. While in our case we rely on a 
smooth fit to regions with no objects, Montage fits a smooth additive term within each run to minimize the 
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differences between it and other overlapping images. This procedure can be far superior to ours in regions 
with many overlapping images. In much of the SDSS area the only overlapping images are at the north and 
south edges of each field, and how the Montage algorithm behaves in that regime is a quantitative question. 
We compare our results to theirs for large galaxies in H4.3I below. 

The second major difference is in their method for "resampling" of images. They do not perform a 
normal resampling but instead use a generalized version of the "drizzle" algorithm. This algorithm computes 
the fraction of each original pixel that overlaps each output pixel, and distributes the original pixel's flux 
accordingly. Relative to properly resampling a Nyquist-sampled image, such methods do a poor job at 
maintaining the original PSF (or even at producing an image with a well-defined PSF). They are also clearly 
irreversible operations even in the noiseless case, demonstrating that information is lost and the image is 
consequently degraded. We compare our results to the Montage results for point sources in §4.11 below, 
and believe that Montage's are less good at the bright end in those tests due to their drizzling procedure. 
However, it is unlikely that these issues affect our main interest here, which is large galaxies on the sky. 

4. End-to-end tests of the sky background 

The residual tests of §2.41 are acceptable as far as they go, but are of little use in evaluating the quality 
of the photometry of objects in the masked regions. In order to do that, we need to compare to ground truth 
(possible to some extent for point sources; see or simulate the analysis of objects similar to those we 

are interested in (necessary for galaxies; see £14.31 and 34 .4[) . 

4.1. PSF photometry 

The standard SDSS analysis excels for point sources. In fact, because that analysis avoids interpolation, 
regridding or stacking of the data we expect that its performance will be superior than anything that our 
background subtracted mosaics could produce. Therefore, as a quality assurance tool to demonstrate how 
photometric our images are, we compare aperture photometry of point sources in our mosaics to that reported 
by the standard SDSS pipeline. The goal here is to verify that the overall scale of the mosaicked images is 
correct and that the signal-to-noise has not been degraded. 

We randomly choose twenty 1 degree square mosaics. Because in this section we are interested in point 
source photometry, we perform a local sky subtraction on the same scales as SDSS uses (100 arcsec median 
smoothed). We of course will not do the same in the analysis of large galaxies. From the overlapping SDSS 
catalog, we choose unsaturated stars with no neighbors within 15 arcsec. For each we measure a circular 
aperture flux within a 7.3 arcsec radius (the standard aperture used by the ubercalibration pipline). Then, 
we compare this flux to that reported within the same aperture by the standard SDSS pipeline. 

Figure [7] shows (as a function of magnitude) the flux ratios between the 7.3 arcsec aperture flux measured 
from a random set of twenty of our mosaic images and that reported by the SDSS catalog for each SDSS 
band. The overall scale of the magnitudes is close to correct, with less than a 1% difference between our 
results and SDSS at bright magnitudes. 

Figure |8] shows the distribution of flux differences, scaled to the expected error distribution. The lines 
are the 16%, 50% and 84% quantiles. This distribution is close to the normal distribution, an indication 
that our mosaicking procedure does not degrade the image very badly. Of course it does degrade it to some 
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degree, an effect particular fy noticeable at the faintest magnitudes, where the median difference becomes 
almost 0.5er and the distribution is significantly wider then the normal distribution. 

We perform the same pair of tests on twenty Montage mosaics of the same regions; results are shown 
in Figures [9] and [101 The flux ratios reveal a scale difference between the Montage images and the SDSS 
catalog of 3-5%. We have not investigated their procedure thoroughly enough to diagnose the cause of this 
error. In any case, as an overall flux error this problem is relatively minor for our purposes here. 

Figure [TU] shows the distribution of flux differences between the SDSS catalog photometry, and our 
aperture photometry of the Montage mosaics. We have first accounted for the scale error before considering 
the flux differences. Having done so, the error distribution clusters around zero. At faint magnitudes the 
distribution is similar to ours. At bright magnitudes it is considerably broader, which we suspect is due to 
the degradation of the image caused by the drizzling procedure. As we show below, these errors appear to 
be less important when considering larger, brighter objects. In addition, the additional error is extremely 
small, at the 1% level. 



4.2. Galaxy photometry: simulated galaxy samples 

In the previous section, we considered the effects of our procedures on point source photometry to 
verify that our images did reasonably well relative to the "ground truth" established by the SDSS standard 
photometry. We do not trust the SDSS standard results in the case of galaxy photometry, — in fact, it is our 
goal to improve that photometry substantially. Therefore, we must use a different tactic to test our results. 



To do so, we insert fake galaxies into the raw data, using the same tools developed b y Blanton ct aL. 
(2005a) and used by several other investigators since (jBlanton et al .l2005rJ: iMandelbaum et al . 200G; Vlasi edi et al 



2006). We distribute fake galaxy images onto a random set of locations on the sky covered by SDSS fields. 
For each observation of an object in an SDSS field in each band, we convert the fake stamps to SDSS raw 
data units, convolve with the estimated seeing from the SDSS photometric pipeline, and add Poisson noise 
using the estimates of the gain. We add the resulting image to the real SDSS raw data, including the tiny 
effects of nonlinearity in the response and the less tiny flat-field variation as a function of column on the 
chip. We then run the SDSS photometric pipeline photo as well as our sky subtraction procedure. Around 
each fake galaxy that we insert, we create a mosaic using the tools described above. 

The fake gal axy sample is based o n the low redshift sample from the New York University Va lue-Added 



Gala xy Catalog (jBlanton et al.ll2005d) . as updated for SDSS Data Release 6 (see, for example, IZhu et al 
20101 ) . This sample consists of essentially all galaxies with r < 17.77 and z < 0.05. For each galaxy, we take 
its flux, half-light radius, Sersic index, axis ratio, and ugriz colors from the SDSS data. We then reassign 
each galaxy a redshift in the range 0.0033 < z < 0.025 (constrained to be smaller than the actual redshift), 
and adjust its size and flux accordingly. This procedure yields a realistic distribution of sizes, fluxes, profile 
shapes and colors, with an emphasis on the large, nearby galaxies of most interest here. 

There are several different definitions of magnitude in the SDSS catalogs. Here we will compare with the 
most robust, the so-called "cModel" fluxes, defined in terms of the best fit combination of the de Vaucouleurs 
and exponential models as follows: 

fc = /oxp (1 - f racdeV) + / dcV X f racdeV (2) 

Figure QT] shows the distribution as a function of measured cModel magnitude and half-light size (from the 
best- fit model in each case). The greyscale and contours are from the SDSS DR8 catalog; the points are the 
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fake data (with parameters as measured by photo v5_6). The fake galaxies follow the real distribution well, 
though they define a roughly size limited sample with r$o > 5 arcsec. 



4.3. Galaxy photometry: isolating the effect of the background 

Even with background subtracted images, galaxy photometry can be a complex process, involving 
deblending the target galaxy from other sources and a choice of algorithms for estimating the total flux and 
other properties. To test our procedure, we want first to be able to isolate the effects of the background 
subtraction and ask whether it is substantially affecting the results. 

To test how well our sky subtraction is doing, we ask directly how much flux it removes from the aperture 
surrounding the object. In particular, we use an aperture with a radius of rgs, containing 95% of the light 
of the galaxy. We use the fake, sky subtracted field and calculate the flux in each band in that aperture. We 
then use the real, galaxy-free, sky subtracted field and calculate the flux in the same aperture. Of course, in 
the fake data the sky subtraction will inevitably subtract off a slightly different (usually larger) amount of 
light than does the sky subtraction in the real data, because of the presence of the large, fake object. The 
fake aperture flux minus the real aperture flux is equal to the fake galaxy flux that remains in the image 
after sky subtraction — the measured fake galaxy flux / moa s, to be compared with the original input fake 
galaxy flux / truo . 

In Figure [T2l the squares in the bottom panel are Am = — 2.5 log 10 (/ meas //t rue ), the residual of the 
measured fake galaxy flux with respect to the original in the r-band as a function of the true galaxy half-light 
radius r$Q. There is no systematic offset of any signficance up to r§Q ~ 100 arcsec, which indicates a rather 
large galaxy. For example, only a couple of dozen galaxies in the DR8 area are that large. The largest 
galaxies get to r$o = 10-20 hr 1 kpc in size, and such galaxies would have r$o = 100 arcsec at a distance of 
20-40 h~ x Mpc. Thus, we can expect only the largest galaxies on the sky — for example, certain Messier 
objects — to have any substantial flux subtracted by this procedure. 

Figure [13] shows the color residuals with respect to size, in u — r, g — r, r — i, and i — z (measured color 
minus true color). Again, the residuals scatter about zero. 

The test performed here will typically underestimate the errors of any actual photometric measurement 
on the fake galaxies, and defines the "best possible" result given our sky subtraction procedure. These results 
confirm that our procedure works very well for the photometry of bright galaxies. 



4.4. Galaxy photometry: full tests of deblended galaxies 



To test more realistic scenarios, Figures [T2l and fl~3l also show results from an automated analysis of these 
mosaics, performed without knowledge of the true image or of the original image without the fake galaxy. 
We will compare these results to the standard SDSS catalog analysis. 

We have tested two versions of photo: the one used in the DR7 catalog (v5_4) and the one used in 
the DR8 catalog (v5_6). Th e v5_6 version was designed to perform sky subtraction better than v5_4. As 
found bv lAihara et al.l (|201lf) . in practice v5_6 is a minor improvement, but the differences between the two 
pipelines are small. 



In these figures, the circles are the SDSS photo v5_6 measurements of the fake galaxies, using the cModel 
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magnitude. The top panel of Figure [12] shows the error in the galaxy half-light radius in the r-band. There 
is a large underestimation of the fluxes and radii — it becomes signficant at r§o > 10 arcsec. The thick solid 
line running through the points is a running median estimate of the residual (inside a window of ±0.15 dex). 
The thin solid line is a quadratic approximation to the running median of the form: 

a Q + a, log 10 (r 50 /10") + a 2 [log 10 (r 50 /10")] 2 (3) 

Table [T] lists the parameters of this fit as well as the standard deviation a a around the median for both 
versions of photo. 

The errors in color shown in Figure [13] are much smaller, though there are detectable trends in u and 
z. Table Q] also lists the fits to the color residuals. 

We also investigated t hese residua l s as a function of galaxy area (that is, 7rrf (6/a), where b/a is the 
axis ratio), as advocated by West et al. ( 2010h . There was little substantial reduction in the scatter in these 



residuals, and qualitatively the results were very similar. 

We emphasize that for the primary SDSS spectroscopic sample at z ~ 0.1 the issue is not nearly as dire 
as it appears in this plot — only the largest such galaxies have r$o ~ 10-20 arcsec, for which the effects are 
still relatively small. 

For comparison, we have performed deblending and photometry on our sky subtracted images. Our 
algorithms are similar to those used in photo, but are tuned to work for large galaxies at the expense of 
small galaxies. Essentially, we detect pixels 5cr above background and connect those pixels into "parent" 
objects. We find peaks within each parent and remove those that are consistent with the image point-spread 
function. We then smooth the remaining image using a Gaussian with a = 4 pixels and redetect peaks: this 
step will retain large galaxies without breaking them up into many children, but essentially ignore small 
ones. Then we define a symmetric template around each galaxy peak and extract the flux in each pixel 
associated with each peak in the same manner that photo does. To make the procedure more robust, we 
detect galaxies and define templates only in the r-band, and then use those same templates for all bands. 
Finally, we fit a two-dimensional Sersic profile to the deblended galaxy image to estimate its half-light radius 
and flux in each band. 

The triangles in Figures [T2l and IT51 show the results of this procedure. Again, the lines show the running 
median and the quadratic fit to it (with parameters listed in Table [T] in the row marked "global"). This 
analysis performs substantially better than photo: the size errors exist but are much smaller, while the flux 
errors are about 0.1 mag independent of size. In addition, the scatter a a is about a factor of two smaller, 
which probably results from a smaller dependence of the errors on the Sersic index and axis ratio of the 
galaxies. 



4.5. Comparison to previous results 



Hvde & Bernardil |2009h found similar results to what we find here, but for elliptical galaxies. For 
comparison to their results, Figure [Ml shows the magnitude errors of photo v5_6 in the r-band, as a function 
of the measured r$Q (not the true value). Again, the lines show the running m edian and the fit. The dotted 
line shows the parametrization of the error given bv iHvde fc Bernardil (|2009h . At small sizes it is in good 
agreement with our results, but at large sizes it exceeds our corrections. Almost certainly this is because 
our result averages over all galaxy profiles. The solid points in Figure IT4"1 indic ate fake galaxies with Sersic 
index n > 3.5, which follow much better the results of 



Hvde fc Bernardif([2009h . 
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West et al.l ([20101 ) also explored these issues, but for a broader distribution of galaxy types. They 
published a fitting function based on the SDSS catalog axis ratios, 90% light radii, and magnitudes (their 
Equation 1). They predict the flux "lost" in the standard reductions ,fi os t = .fmeas ~ /true (in nanomaggies ). In 
this context, both / 



and /true are standard SDSS Petrosian fluxes (jBlanton et al 



20011: IStoughton et al 



20021) . IWest et al.l (|2010f ) suggest the simple form: 

logio /lost = b + h log 10 m r + b 2 log 10 r go + b 3 log 10 b/a, 



(4) 



where m r , rgo and b/a are the Petrosian quantities reported by the SDSS catalog. We have refit the 
parameters of this model to our simulated data. Table [2] lists the values of the parameters of this fit, for both 
versions of photo studied here, and evaluating the flux lost in both the Petrosian and cModel estimates. 

We demonstrate the quality of fit for this formula in Figure [15] showing the actual flux lost relative 
to the predicted value. The open circles show results for the DR7 version of the photometry (photo v5_4) 
whereas the filled circles show results for the DR8 version of the photometry (photo v5_6). The predictions 
perform excellently on DR7 data. However, for DR8 data there is greater scatter. We find a correlation 
between the scatter and m r that the simple function above fails to explain; in particular, the outliers below 
the relation are typically galaxies with m r < 1 5. Thes e differences are lik ely due to the DR8 photometry 
being slightly better in the large size, high flux regime (jAihara et al.ll201ll ). 

The coefficients for Equation 0] derived by West et al. ( 2010l) are not directly comparable to those in 
Table [2] Their comparison only measured the flux lost due to sky subtraction alone, and did not include 
the effects of photo's deblending. Our comparison is to the photo catalog itself and therefore includes the 
effects of deblending. It turns out that in the SDSS pipeline results, a substantial function of the lost flux 
is due to effects of poor deblending. Therefore, the coefficients found here would be most applicable to an 
actual SDSS photometric catalog sample. 

However, we do not necessarily recommend using these formulae for robust determinations of fluxes for 
bright and/or large galaxies — reanalyzing properly background subtracted images is clearly a better route. 
However, they are excellent ways to determine when such effects are important in the SDSS catalog. 



4.6. Direct comparison to Montage 

As another test, we have also taken a sample of nearby bright galaxies and compared large- aperture 
photometry between our images and images from Montage. We randomly select 200 galaxies brighter than 
r = 15 to perform this test. We compare aperture photometry within a radius of 1.3rgo for each galaxy. 
In general, the fluxes and colors are in good agreement. The median ratio of the flux estimated from the 
Montage images to our flux estimate to is about 96%, as for the point source estimates above (and likely for 
the same reason). The standard deviations around this mean for the (u, g,r,i, z) bands are (5,2,1,2,3)%, 
indicating very good agreement between the methods. 



5. Summary 

In this paper, we have presented a method to estimate the sky background in SDSS images, suitable 
the purposes of measuring large, bright galaxies. While this model has no physical motivation, it appears to 
have the right balance of freedom and physical constraints to adequately describe most of the SDSS imaging. 
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We have tested the measurements of point sources in the sky subtracted images, to ensure that they are 
not signficantly degraded with respect to the original SDSS data. In addition, we have inserted fake galaxies 
into the data and reran the sky subtraction to evaluate how well our procedure works for bright galaxies. 
We find substantial improvement with respect to the standard pipelines. Whereas for the standard SDSS 
photo pipeline the sky subtraction becomes important at the 20% level at r$o > 10 arcsec or so, we do not 
see comparable errors in our results even up to r$o ~ 100 arcsec. 

We have compared our results to a similar service provided by IRSA, the Montage package. Essentially, 
our results are similar to theirs for large objects, though we identify a slight error in their overall fluxes. For 
bright stars their interpolation method appears to introduce some very slight errors. 

These results are now publicly available as part of the SDSS-III DR8 release. Users can acquire individual 
sky subtracted and calibrated fields, as well as request large-scale mosaics in FITS format. 
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Fig. 1. — Data that our sky model is fit to, for part of an SDSS drift scan run. The vertical direction is 
the scan direction (y), the horizontal direction is perpendicular to the scan direction (x). This image is the 
8x8 binned flat- fielded SDSS data. Areas where no objects were detected show the original data. Areas 
where objects were detected are replaced by the background sky estimate plus noise. Each of the six vertical 
stripes represents a "camcol" and the black areas in between are not covered by a CCD. The units of the 
image are raw counts, which therefore reflect the relative gains of the different CCDs. 



Fig. 2. — Mask applied to the data, for the same area of sky as shown in Figure [T] White areas contribute 
to the fit, black areas do not. 



Fig. 3. — Final model sky fit, for the same area of sky as shown in Figure [T] This model represents an 
evaluation of Equation [TJ 




Fig. 4. — The residuals of the data from the model (literally Figure Q] minus Figure [3]). 
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Fig. 5. — Distribution of residuals in each band in areas with no detected objects. In each panel, we list the 
standard deviation a of the residuals in nanomaggies per arcsec -2 , and the fraction of areas that are > 3d 
outliers from the distribution. The smooth line is a Gaussian with a standard deviation a for comparison. 
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Fig. 6. — Some example gri composite images made with the methods described in this paper. Upper left: 
Coma Cluster (18' x 18'); upper right: NGC 5395 (18' x 18'); middle left: Messier 5 (18' x 18'); middle right: 
Messier 101 (21' x 21'); lower left: NGC 4656 (24' x 24'); lower right: UGC 3974 (12' x 12'). 
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Fig. 7. — Flux ratios between the 7.3 arcsec aperture flux measured from our mosaic images, and that 
reported by the SDSS catalog, for each SDSS band. The lines are the 16%, 50% and 84% quantiles. The grey 
scale is proportional to the conditional distribution of the flux ratio on the magnitude. At bright magnitudes, 
the differences are small, indicating that these images have retained their overall photometricity. In each 
panel, we list the mean value and dispersion of the flux ratios in the indicated magnitude range. 
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Fig. 8. — Similar to Figure [3 but now showing flux differences scaled to the SDSS aperture flux uncertainty. 
In the right-hand panels we show the distribution of each set of differences. The smooth Gaussian curve is 
the expected normal distribution. The flux differences are not far removed from the normal distribution, 
indicating that our images are not much degraded relative to the original images. 
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Fig. 9. — Similar to Figure [3 but using Montage images instead of our mosaics. There is an overall scale 
difference in all bands of about 2-3% (not a particularly large error if it is uniform). 
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Fig. 10.— Similar to Figure but using Montage images instead of our mosaics. We account for the overall 
scale difference found in in each band before making these plots. The distribution of differences is much 
broader than the SDSS uncertainties, indicating that for point sources Montage images are significantly 
degraded. 
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Fig. 11. — Distribution of r-band measured half-light sizes and magnitudes for our fake galaxy sample 
(points) and for real SDSS galaxies (greyscale and contours). The two distributions follow each other well, 
but the fake sample is roughly size-limited at r§o ~ 5 arcsec. 
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o Standard DR8 pipeline running median 

A This paper (real deblending) best fit to median 

□ This paper (perfect deblending) 




0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 
lo gio r so (arcsec) 



Fig. 12. — Errors in half-light radii rso (top panel) and magnitudes (bottom panel) in the r-band, as a 
function of true galaxy rso. The open circles show standard SDSS catalog photometry from the DR8 version 
of the pipeline; the DR7 version is similar but slightly worse. The triangles show photometry and deblending 
using the methods described in this paper. The open squares show the photometry of the fake galaxies 
assuming perfect deblending. The thick lines are running medians for each distribution. The thin lines are 
the smooth ht given by Equation [3j with parameters listed in Table [T] 
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o Standard DR8 pipeline running median 

A This paper (real de blending) best fit to median 

□ This paper (perfect deblending) 




0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.00.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 
lo Sio r so (arcsec) log 10 r 50 (arcsec) 



Fig. 13. — Similar to Figure [T2l but showing errors in colors with respect to r, using the u, g, i and z bands. 



-29 - 




Fig. 14. — Similar to Figure 1121 but showing errors r-band magnitude against the measured value of r$o 
rather than the true value. Open circles show galaxies with Sersic indice s n < 3.5, and fill e d circ les show 
galaxies with n > 3.5. The dotted line shows the functional form given bv lHvde fc Bernardil (|2009l ). 
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Fig. 15. — Actua l lost flux compared to that predicted using the functional form in Equation |4] (proposed by 



West et alJl2010l ). with the coefficients from Table 0] We show the actual flux lost (in nanomaggies) in the 



standard SDSS measurements relative to the total Petrosian flux of each fake galaxy, against a prediction 
based on the measured Petrosian magnitude, 90% light radius and axis ratio b/a. The filled circles show the 
results for the DR8 version of the pipeline, and the open circles show the results for the DR7 version of the 
pipeline. 
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Table 1. Fits to residual sizes and fluxes 



Quantity 


Sky version 


«o 


«i 


a 2 






v5_4 


-0.151 


-0.211 


-0.262 


0.134 




v5_6 


-0.146 


-0.193 


-0.269 


0.129 




global 


-0.036 


0.034 


-0.215 


0.086 


m r 


v5_4 


0.168 


0.641 


0.704 


0.173 




v5_6 


0.101 


0.486 


0.861 


0.237 




global 


0.079 


0.032 


-0.001 


0.163 


ill — 7* 1 


v5_4 


0.028 


0.184 


-0.047 


0.147 




v5_6 


-0.008 


0.113 


0.006 


0.198 




global 


0.062 


-0.009 


-0.001 


0.073 


(<? - r) 


v5_4 


-0.008 


-0.007 


0.004 


0.045 




v5_6 


-0.013 


-0.015 


0.010 


0.066 




global 


0.001 


0.002 


-0.005 


0.016 


(r - i) 


v5_4 


-0.017 


-0.025 


0.040 


0.056 




v5_6 


-0.018 


-0.036 


0.026 


0.074 




global 


0.008 


0.002 


-0.003 


0.021 


(r-z) 


v5_4 


-0.054 


-0.129 


0.122 


0.142 




v5_6 


-0.051 


-0.156 


0.046 


0.165 




global 


-0.011 


-0.001 


0.001 


0.053 


m r (vs meas.) 


v5_4 


0.379 


0.959 


0.530 


0.325 




v5_6 


0.293 


0.727 


0.131 


0.290 




global 


0.092 


0.040 


-0.098 


0.177 



Note. — isted parameters refer to Equation|3l Magnitude offsets are in 
units of magnitudes. Half-light radius offsets are in units of dex. Results 
labeled v5_4 are appropriate for the SDSS DR7 catalog. Results labeled 
v5_6 are appropriate for SDSS DR8 catalog. Results labeled "global" are 
appropriate for the sky-subtraction and deblending analysis described in 
this paper. These fits are appropriate between a true f5o of 5 and 100 
arcsec, or in the case of the last three rows a measured r$o of 5 and 40 
arcsec. See 2] for more information. 
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Table 2. Multiparameter fits to lost flux 



Flux type 


photo version 


bo 


bi 


b 2 


h 




Petrosian 


v5_4 


6.359 


-6.609 


3.129 


0.767 


0.291 




v5_6 


4.643 


-4.487 


2.320 


0.614 


0.540 


cModel 


v5_4 


11.784 


-10.470 


2.453 


0.538 


0.357 




v5_6 


9.510 


-8.239 


1.995 


0.362 


0.509 



Note. — The listed parameters refer to Equation |U <7b lists r.m.s. de- 
viation around the model in dcx. 



